Crystal Structure of de Novo Designed Coiled-Coil Protein Origami Triangle

Coiled-coil protein origami (CCPO) uses modular coiled-coil building blocks and topological principles to design polyhedral structures distinct from those of natural globular proteins. While the CCPO strategy has proven successful in designing diverse protein topologies, no high-resolution structural information has been available about these novel protein folds. Here we report the crystal structure of a single-chain CCPO in the shape of a triangle. While neither cyclization nor the addition of nanobodies enabled crystallization, it was ultimately facilitated by the inclusion of a GCN2 homodimer. Triangle edges are formed by the orthogonal parallel coiled-coil dimers P1:P2, P3:P4, and GCN2 connected by short linkers. The triangle has a large central cavity and is additionally stabilized by side-chain interactions between neighboring segments at each vertex. The crystal lattice is densely packed and stabilized by a large number of contacts between triangles. Interestingly, the polypeptide chain folds into a trefoil-type protein knot topology, and AlphaFold2 fails to predict the correct fold. The structure validates the modular CC-based protein design strategy, providing molecular insight into CCPO stabilization and new opportunities for design.

Recent developments in protein design combined with machine learning enable the design of de novo globular proteins. 1−4 Nevertheless, extensive experimental validation is still required to identify the sequences with desired structure and function. 5 An alternative strategy to protein scaffold design is to use modular building blocks with a well-understood sequence−structure relationship. 5−9 In a similar manner, DNA nanotechnology takes advantage of the modular base-pairing in the DNA duplex to design DNA nanostructures. 10,11 The principle of modular pairing also applies to some protein motifs such as coiled-coils. 12−14 Coiled-coils (CCs) associate according to well-defined pairing rules encoded in the heptad repeat pattern of abcdefg positions. 13,15 CC dimers pair with high specificity through a combination of hydrophobic and electrostatic interactions at the heptad positions a/d and e/g, respectively (Figure 1a). 16 In the coiled-coil protein origami (CCPO) design strategy, CCs are used as modular building blocks to design protein nanostructures. 17 The desired shape is defined through the topological arrangement of parallel and/or antiparallel CC dimers arranged into a precisely defined sequential order, based on the underlying mathematical rules. 18,19 Protein folds such as the tetrahedron and bipyramid, as well as multichain assemblies, have been assembled using CCPO, 19−22 and even the folding pathway of those assemblies has been designed. 23 Although CCPO has proven to be a robust strategy for the design of various protein topologies and their shape has been confirmed by electron microscopy and small-angle X-ray scattering (SAXS), no high-resolution structural information has been available for these structures. The main difficulty is the high flexibility and small size of CCPO structures, which make them challenging to study using high-resolution methods such as cryo-electron microscopy and X-ray crystallography.
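The heptad register described above (positions abcdefg, with a/d contributing hydrophobic contacts and e/g contributing electrostatic ones) can be illustrated with a minimal sketch. It assumes an uninterrupted heptad repeat that starts at a known position; the peptide `toy` is a hypothetical sequence chosen for illustration and is not one of the P1 to P6 or GCN sequences from this work.

```python
HEPTAD = "abcdefg"

def heptad_register(sequence, start="a"):
    """Pair each residue with its heptad position.

    Assumes an uninterrupted repeat whose first residue sits at `start`.
    """
    offset = HEPTAD.index(start)
    return [(res, HEPTAD[(offset + i) % 7]) for i, res in enumerate(sequence)]

def residues_at(sequence, wanted, start="a"):
    """Return the residues found at the requested heptad positions."""
    return [res for res, pos in heptad_register(sequence, start) if pos in wanted]

# Hypothetical two-heptad peptide (illustration only, not from the paper).
toy = "IAALKEENAALKQE"
core = residues_at(toy, "ad")    # positions expected to face the partner helix
flank = residues_at(toy, "eg")   # positions expected to carry charged residues

print("a/d:", core)
print("e/g:", flank)
```

In this toy peptide the a/d positions hold Ile, Leu, Asn, and Leu, and the e/g positions hold Lys and Glu, which mirrors the general pattern described in the text without reproducing any designed sequence.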
To address this issue, we sought to determine the crystal structure of the most elementary CCPO, the triangle. We designed a triangular protein using three orthogonal CC heterodimers concatenated in a single polypeptide chain. The triangular topology can only be achieved using parallel CC dimers since the polypeptide chain has to traverse each triangle edge in the same direction (Figure 1b).

Figure 1. Design of triangular CCPO using coiled-coil building blocks. (a) Helical wheel representation of a parallel coiled-coil with hydrophobic interactions between a/d residues and electrostatic interactions between e/g pairs. (b) To design triangular origami, three parallel coiled-coil pairs are selected and arranged sequentially in the polypeptide chain.
For the initial design, we used charged CC variants (abbreviated SN) P1:P2, P3:P4, and P5:P6 19 connected with a 5-residue linker (GSGPG) (Table S1). TRI-6SN had CD spectra with high helical content (Figure S1); however, no crystals could be obtained under the studied conditions. As TRI-6SN likely resists crystallization due to flexibility, we designed three variants with shorter linkers of 1−3 residues (G, GS, GSG). Variants with 2- and 3-residue linkers showed a high helix content and unfolded cooperatively (Figure S1). However, no crystals were obtained.
We speculated that the termini may be responsible for the high flexibility. Therefore, we designed a cyclized variant, TRI-cySN, in which the termini were covalently linked using a trans-splicing reaction based on orthogonal split-inteins (Figure S2). 24 While cyclization significantly increased protein thermal stability, it still did not lead to crystallization. Finally, we resorted to the ultimate strategy for the crystallization of difficult proteins and used nanobodies as crystallization chaperones. We designed the TRI-SHb variant using stabilized and helical peptides (abbreviated SHb) of P1:P2, P3:P4, and P5:P6. Since no specific nanobodies are available to bind these CCs, we applied an epitope transplantation strategy. 25 By substituting several solvent-exposed residues, we mimicked the helical epitope of the IB3 intrabody, which binds a helical segment of the huntingtin peptide. 26 In this way, we successfully introduced the IB3 binding site into P1:P2 or both P1:P2 and P5:P6 pairs (Figure S3). However, even the triangle-nanobody complexes failed to yield any crystals.
Previously, we characterized a set of specific nanobodies that recognize different CCs of the designed tetrahedron. 27 During crystallographic experiments, we observed that nanobody complexes with P5:P6 and P7:P8 CC heterodimers were difficult to crystallize compared to those with homodimers such as APH2, GCN2, and BCR2. Based on this experience, we tested whether substituting one CC heterodimer with a parallel homodimer would improve crystallization. We designed the variant TRI-4SHbGCN, in which P5:P6 is replaced by a GCN2 homodimer and the segments are connected with GSG linkers (Figures 2a, S4). The purified TRI-4SHbGCN (Figure S5) is monodisperse in solution with a molecular weight of 24.5 ± 0.2 kDa, in agreement with the theoretical value, and a hydrodynamic radius of 5.5 ± 1.2 nm (Figure 2b,c). CD analysis shows around 90% helix content, exceptional thermal stability with a melting temperature > 85 °C, and protein refolding ability upon cooling (Figure 2d,e). Importantly, the TRI-4SHbGCN variant crystallized in a range of different conditions.
One crystal form belonged to space group P1, diffracted to 2.05 Å, and the structure was solved using molecular replacement (PDB: 8P4Y, Table S2). The asymmetric unit contains one TRI-4SHbGCN molecule with a triangular fold as designed (Figure 3a). The triangle is nonequilateral with a shorter GCN2 side (34 Å) and two longer (47 Å) P1:P2 and P3:P4 sides. There is an internal cavity of about 600 Å². The electron density is continuous for the entire chain, except for linker sequences, where only two linkers have clear density and could be modeled in the structure. These two linkers are attached to the P4 segment, suggesting this is a more rigid part of the structure, as also reflected in the lower average B-factor for P4 (Figure S6). Interestingly, in the linker connecting P4 to P2 both Ser100 and Gly101 become part of the P2 helix and the Ser100 side chain hydrogen bonds to Trp95 on the preceding P4 segment (Figure 3b). Thus, part of the GSG linker is integrated into the helix, leaving only Gly99 as a flexible linker residue. Our final crystallographic model is consistent with the SAXS profile of TRI-4SHbGCN in solution (χ² = 1.34) and fits well into the ab initio protein envelope calculated from SAXS data (Figure 3c).
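The reported edge lengths (34 Å for the GCN2 side, 47 Å for the P1:P2 and P3:P4 sides) allow a quick geometric check. The sketch below treats the three edges as the sides of an idealized planar triangle, which is an assumption made here for illustration; it is not how the cavity area or any other value in the structure was measured.

```python
import math

def triangle_angles(a, b, c):
    """Interior angles in degrees opposite sides a, b, c (law of cosines)."""
    def angle(opposite, s1, s2):
        cos_value = (s1**2 + s2**2 - opposite**2) / (2 * s1 * s2)
        return math.degrees(math.acos(cos_value))
    return angle(a, b, c), angle(b, a, c), angle(c, a, b)

def heron_area(a, b, c):
    """Area of a triangle from its side lengths (Heron's formula)."""
    s = (a + b + c) / 2
    return math.sqrt(s * (s - a) * (s - b) * (s - c))

# Edge lengths in Å as reported in the text; planarity is an assumption.
gcn_edge, p1p2_edge, p3p4_edge = 34.0, 47.0, 47.0

angles = triangle_angles(gcn_edge, p1p2_edge, p3p4_edge)
outline_area = heron_area(gcn_edge, p1p2_edge, p3p4_edge)

print("angles (deg):", [round(x, 1) for x in angles])
print("idealized outline area (Å^2):", round(outline_area))
```

Under this idealization the angle opposite the short GCN2 edge is about 42° and the other two are about 69°, so all three vertices are acute. The idealized outline area (roughly 745 Å²) is not directly comparable to the reported cavity of about 600 Å², since the helices themselves occupy part of the outline.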
Individual CC dimers are well resolved in the structure and show the expected packing interactions between a/d residues (Figure 3d). The CCPO strategy relies on the modularity and orthogonality of CC dimers. It is therefore relevant to examine whether the incorporation of CCs into larger assemblies affects their structure. Superposition of the GCN2 dimer as observed in CCPO with the isolated GCN2 shows no significant changes in terms of Cα RMSD and Crick's parameters (Figure S7, Table S3). The structure of P1:P2 and P3:P4 dimers has not been determined before, as the only available structure of CCs from this design set is that of the P5:P6-nanobody complex. 27 Pairing Asn residues at position a has a stabilizing effect and contributes to peptide orthogonality. Within the structure, we observe the formation of a hydrogen bond network involving the backbone on one side and the adjacent Glu and Lys residues on the other side (Figure S8). Superpositions of Table S3). Therefore, the structures of CCs incorporated into the triangular fold remain essentially identical to the structures in isolation.
The crystal lattice is assembled by dense stacking of triangles on top of one another along the a and b unit cell axes and through end-to-end arrangement along the c axis (Figure 4a). Despite the presence of a cavity in the triangle center, the solvent content of crystals is 43.5%, below the average for this point group. 28 Each molecule interacts with 10 symmetry-related molecules via 5 unique interfaces, all of which are heterologous (Figure S9). Crystal packing buries 3600 Å² of solvent-exposed surface area per molecule, which represents about 30% of the molecular surface. While typical crystal contacts are formed via a subset of residues, we observe that a considerable number of the residues are involved in crystal contacts, mostly hydrogen bonds and salt bridges. As expected, the majority of hydrogen bond crystal contacts are formed by the residues at exposed positions b/c and f (Figure 4b); however, an almost equal number of hydrogen bond contacts is established by residues at the e/g positions. Generally, e/g positions provide electrostatic complementarity between CC dimers, but here, electrostatic interactions between two e/g positions also promote crystal contacts (Figure 4c).
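The packing figures quoted above can be related to one another with some back-of-the-envelope arithmetic. The sketch below uses the standard Matthews relation, solvent fraction ≈ 1 − 1.23/V_M, which assumes a typical protein partial specific volume; it also assumes one molecule per unit cell (space group P1 with one molecule in the asymmetric unit) and uses the measured 24.5 kDa as the molecular weight. The derived numbers are estimates made here and are not values reported by the authors.

```python
def matthews_coefficient(solvent_fraction, k=1.23):
    """Invert solvent = 1 - k / V_M to get V_M in Å^3 per Da.

    k = 1.23 assumes a typical protein partial specific volume.
    """
    return k / (1.0 - solvent_fraction)

# Values taken from the text.
solvent_fraction = 0.435        # 43.5% solvent content
molecular_weight_da = 24500.0   # 24.5 kDa, measured in solution
buried_area = 3600.0            # Å^2 buried per molecule
buried_fraction = 0.30          # "about 30%" of the molecular surface

v_m = matthews_coefficient(solvent_fraction)

# Assumption: one molecule per unit cell (P1, one molecule per asymmetric unit).
estimated_cell_volume = v_m * molecular_weight_da * 1

# Total molecular surface implied by the two reported packing numbers.
implied_total_surface = buried_area / buried_fraction

print("V_M (Å^3/Da):", round(v_m, 2))
print("estimated cell volume (Å^3):", round(estimated_cell_volume))
print("implied molecular surface (Å^2):", round(implied_total_surface))
```

With these inputs the Matthews coefficient comes out near 2.2 Å³/Da and the implied molecular surface near 12,000 Å². Both are rough consistency checks, limited by the rounding of the reported percentages.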
An unexpected feature of the structure is the interactions between the CC segments at the vertices. The contact map of TRI-4SHbGCN shows the designed interactions between orthogonal CC segments parallel to the main diagonal, while the interactions between CC pairs appear cross-diagonally (Figure 5a). For example, at vertex 1 (P1:P2/GCN2) Arg23 at position c on the P1 segment stacks against Tyr150 from the second GCN segment. At the neighboring position f on P1, Arg26 hydrogen bonds to Asn149 from GCN, while Trp30 packs on top of the Leu-Leu pair of the GCN hydrophobic core (Figure 5b). At vertex 2 (GCN2/P3:P4) there is a hydrogen bond network between Arg158 at position c of the second GCN segment and two glutamate side chains (Glu69 and Glu73) on the P4 segment (Figure 5b). At vertex 3 (P3:P4/P1:P2) Arg91 from P4 forms electrostatic interactions with Glu11 from P1, and Trp95 from P4 hydrogen bonds to Ser100 from the GSG linker (Figures 3d, 5b). Although the interactions between CC dimers were not intentionally designed, they likely form due to the acute angles at the triangle vertices, which bring the side chains from the CC segments into proximity.
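The statement that designed parallel CC contacts run parallel to the main diagonal of a contact map can be illustrated with a toy model. The sketch below represents two parallel helices as straight lines of points; the 1.5 Å rise per residue, the 6 Å helix separation, the 6.5 Å cutoff, and the minimum sequence separation are all illustrative assumptions, and no coordinates from PDB 8P4Y are used.

```python
import math
from collections import Counter

def contact_pairs(coords, cutoff=6.5, min_separation=5):
    """Index pairs (i, j), j - i >= min_separation, within `cutoff` Å."""
    pairs = []
    n = len(coords)
    for i in range(n):
        for j in range(i + min_separation, n):
            if math.dist(coords[i], coords[j]) <= cutoff:
                pairs.append((i, j))
    return pairs

def offset_histogram(pairs):
    """Count contacts by sequence offset j - i.

    A constant offset corresponds to a line parallel to the main diagonal.
    """
    return Counter(j - i for i, j in pairs)

# Toy geometry (assumed): two straight, parallel 10-residue "helices".
RISE = 1.5  # Å per residue along the helix axis
helix_a = [(0.0, 0.0, RISE * k) for k in range(10)]  # residues 0-9
helix_b = [(6.0, 0.0, RISE * k) for k in range(10)]  # residues 10-19, same direction

pairs = contact_pairs(helix_a + helix_b)
offsets = offset_histogram(pairs)

print(sorted(offsets.items()))
```

In this toy model every contact has an offset of 9, 10, or 11, so the contacts form a narrow band parallel to the main diagonal, as expected for a parallel dimer. Contacts between different CC pairs at a vertex would not share a single offset and would therefore fall outside such a band.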
A closer inspection of the TRI-4SHbGCN topology revealed that the chain forms a relatively shallow protein knot, known as the trefoil-type knot. 29,30 From the top view, helix segments in each dimer are approximately parallel and alternate by packing on either the inner or outer side of the triangle (Figure S10). From the side view, the helical axis in each dimer crosses the plane of the triangle (Figure S10), so the first three helix segments are arranged in a triangle that intertwines with the triangle formed by the last three segments. Interestingly, AlphaFold2 31,32 is unable to predict the fold of TRI-4SHbGCN (and other CCPO structures, Figure S11), most likely due to the complex folding topology and the absence of this type of fold in structural databases. CoCoPOD 19 was used to generate an ensemble of TRI-4SHbGCN models. The knot is not present in these models, and the agreement with SAXS data is systematically worse compared to the crystallographic model (Figure S12). However, due to the SAXS resolution limit, we cannot conclusively resolve whether the knot is also present in solution.
The presented high-resolution structure not only validates the designed CCPO topology but also reveals previously unobserved structural features such as stabilizing interactions between CC segments at vertices and integration of linkers into the CC helix while also confirming that the structure of CC dimers is unperturbed in the context of protein origami. TRI-4SHbGCN forms, to our knowledge, the smallest knot in a designed protein, occurring due to a supercoil of CCs, similar to designed knots in DNA nanostructures. 33